Applying Policy Iteration for Training Recurrent Neural Networks

Authors

  • István Szita
  • András Lörincz
Abstract

Recurrent neural networks are often used for learning time-series data. Based on a few assumptions, we model this learning task as a minimization problem of a nonlinear least-squares cost function. The special structure of the cost function allows us to build a connection to reinforcement learning. We exploit this connection and derive a convergent, policy iteration-based algorithm. Furthermore, we argue that RNN training fits naturally into the reinforcement learning framework.

Keywords: recurrent neural networks, policy iteration, sequence learning, reinforcement learning

Similar resources

PIRANHA: Policy iteration for recurrent artificial neural networks with hidden activities

It is an intriguing task to develop efficient connectionist representations for learning long time series. Recurrent neural networks hold great promise here. We model the learning task as a minimization problem of a nonlinear least-squares cost function that takes into account both one-step and multi-step prediction errors. The special structure of the cost function is constructed to build a b...

Neuro-Optimizer: A New Artificial Intelligent Optimization Tool and Its Application for Robot Optimal Controller Design

The main objective of this paper is to introduce a new intelligent optimization technique that uses a prediction-correction strategy supported by a recurrent neural network for finding a near-optimal solution of a given objective function. Recently there have been attempts to use artificial neural networks (ANNs) in optimization problems and some types of ANNs such as Hopfield network and Boltzm...

Soft Value Iteration Networks for Planetary Rover Path Planning

Value iteration networks are an approximation of the value iteration (VI) algorithm implemented with convolutional neural networks to make VI fully differentiable. In this work, we study these networks in the context of robot motion planning, with a focus on applications to planetary rovers. The key challenging task in learning-based motion planning is to learn a transformation from terrain obse...

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and the second momentum term in feedforward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

An Empirical Comparison of Neural Architectures for Reinforcement Learning in Partially Observable Environments

This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory [7], Gated Recurrent Unit [3] and MUT1, a recurrent neural architecture evolved from a pool of several thousand candidate architectures [8]. A variant of fitted Q iteration, based on A...

Journal:
  • CoRR

Volume: cs.AI/0410004

Publication date: 2004